Element Brief Description
Code name foood
Project title Exploring the Relationship Between COVID-19 and Dietary Health
Authors
Affiliation INFO-201: Technical Foundations of Informatics - The Information School - University of Washington
Date February 18th, 2022
Abstract This project uses global food and COVID-19 data to explore the relationship between diet and COVID-19 mortality. Correlations between these two factors could indicate that (1) healthy diets improve COVID-19 outcomes, (2) countries with enough food are better equipped to combat COVID, or (3) a combination of both.
Keywords dietary health, COVID-19, public health, food
1.0 Introduction Over the last two years, the COVID-19 pandemic has swept across the world. During this time, scientists, politicians, and world leaders have been trying to find a way to return to normal. By finding correlations between dietary data and COVID-19 statistics, we hope to gain valuable information on how diet affects COVID-19 mortality. More specifically, we will explore correlations between COVID-19 mortality and nations’ macronutrient consumption through the COVID-19 Healthy Diet Dataset. By understanding these data, we can better understand dietary health’s relationship with our immune systems.
2.0 Design Situation
3.0 Research questions
4.0 The Dataset This data set, Food Supply Kcal (Marila Prata, 2020, https://www.kaggle.com/mpwolke/food-supply-kcal/data), covers the global population affected by COVID-19. It also records the food supply, nutrition values, obesity percentages, malnourishment percentages, and food habits of the countries represented, along with COVID-19 deaths, active cases, and recovered cases. Having these variables together makes the correlation between global food habits and global COVID-19 cases easier to understand and puts it into perspective for the audience of the data. The set excludes race and gender. Excluding these variables does not change the validity of the data or compromise its purpose, but including them could reveal disproportionate infection rates across these two factors and whether those rates are affected by different diets. It could also show commonalities between men’s and women’s diets and which gender tends to have a higher infection rate. The data was amassed by Kaggle user Marila Prata, who collected information from sources such as the Food and Agriculture Organization of the United Nations (http://www.fao.org/faostat/en/#home), the Population Reference Bureau (https://www.prb.org/), the Johns Hopkins Center for Systems Science and Engineering (https://coronavirus.jhu.edu/map.html), and the USDA Center for Nutrition (https://www.choosemyplate.gov/). The data was first collected at the beginning of the COVID-19 pandemic and was updated until April 2020, when updates ceased. Collecting this data was an attempt to answer the question: what non-pharmaceutical interventions could the population make in order to stop the spread of COVID-19? No funding was involved in collecting the data, as the majority of the information is publicly available.
Alternatively, those who could benefit, financially or otherwise, from a data set of this nature are direct stakeholders such as large healthcare organizations and the government, specifically its food and agricultural divisions. The validity of this data is strong. Many of the resources cited come from official United States government websites and a reputable source, Johns Hopkins. Besides the COVID-19 information collected from Johns Hopkins, the set includes no data from sources outside of official, public information. This makes the data easier to trust, on the assumption that information originating from the United States government is honest and not exclusionary. This data was obtained from Kaggle, an online platform used to publish and find datasets. The source of the data is credited as Kaggle, and the aforementioned sources are factual and verifiable.
5.0 Expected Implications Answers to our research questions can have a positive impact on several fronts. For professional technologists, the results will support a more mature and comprehensive understanding of nutritional research, which they can use to adjust nutritional balance. For designers, knowing more about the nutritional content of food provides broader theoretical knowledge to draw on and apply in practice. Finally, for policymakers, the results will inform rules and requirements for the food industry that protect people’s health. During COVID-19, such measures could help people strengthen their immunity through balanced nutrition, so that they are better able to resist the coronavirus. Because adequate nutritional balance is critical during a pandemic, it is something policymakers will need to consider.
6.0 Limitations One limitation we need to consider is the limited time frame of this data. The data stopped being updated in April 2020, so it only encompasses the very beginning of the COVID-19 pandemic, which makes it difficult to be certain that nourishment played the largest role in these COVID-19 statistics. Additionally, we miss out on how nourishment played a role in the later spread of the disease as new variants emerged. Furthermore, we have to take into account the different collection biases when comparing countries’ COVID-19 mortality rates. Geographical variation in COVID-19 strains is another consideration, as the strain in America is not the same as in Europe and likely has different effects. Lastly, we have to consider how different cultures view food, how limited a commodity it is in some nations, and how that might affect the data we are analyzing.
Acknowledgements We would like to thank our TA, our professor, and our team members, because it is through them that we came to understand more about data.
References
Appendix A: Questions No questions so far! Thanks for asking :smile:

Summary Information

Our data set included food consumption and COVID-19 statistics from countries representing people. With these data, we are able to analyze how diet relates to COVID-19 outcomes, and what that implies about population health. Countries combating COVID-19 have a median mortality rate of 0.012 and an average mortality rate of 0.018. Because the average exceeds the median, the death rate distribution skews right. As for obesity, countries experience obesity at a median rate of 21.2% and an average rate of 18.7%. Because the average falls below the median, the obesity rate distribution skews left. For a better look at our data, here is a summary table consisting of food and COVID-19 data for the 15 most populous countries:

Aggregate Table

| Country | Population | Cases | Deaths | Fatality Ratio (Deaths to Cases) | Obesity (%) | Malnourished (%) | Animal Products (kg) | Vegetable Products (kg) |
|---|---|---|---|---|---|---|---|---|
| China | 1398030000 | 125667 | 4857 | 0.039 | 6.6 | 8.5 | 13.424 | 36.574 |
| India | 1391885000 | 42692943 | 509358 | 0.012 | 3.8 | 14.5 | 11.336 | 38.657 |
| United States of America | 329153000 | 77918466 | 922470 | 0.012 | 37.3 | <2.5 | 21.235 | 28.759 |
| Indonesia | 268419000 | 4807778 | 145176 | 0.030 | 6.9 | 8.3 | 6.258 | 43.744 |
| Pakistan | 216565000 | 1488958 | 29828 | 0.020 | 7.8 | 20.3 | 22.276 | 27.723 |
| Brazil | 209332000 | 27552267 | 639151 | 0.023 | 22.3 | <2.5 | 17.347 | 32.654 |
| Nigeria | 200964000 | 254016 | 3141 | 0.012 | 7.8 | 13.4 | 1.739 | 48.258 |
| Bangladesh | 163667000 | 1914356 | 28838 | 0.015 | 3.4 | 14.7 | 5.193 | 44.803 |
| Russia | 146731000 | 14102736 | 334093 | 0.024 | 25.7 | <2.5 | 16.152 | 33.847 |
| Mexico | 126577000 | 5292706 | 312819 | 0.059 | 28.4 | 3.6 | 15.153 | 34.846 |
| Japan | 126180000 | 3975513 | 20516 | 0.005 | 4.4 | <2.5 | 15.319 | 34.678 |
| Ethiopia | 112079000 | 467575 | 7426 | 0.016 | 3.6 | 20.6 | 5.296 | 44.699 |
| Philippines | 108117000 | 3639942 | 55094 | 0.015 | 6.0 | 13.3 | 6.711 | 43.291 |
| Egypt | 99064000 | 457081 | 23409 | 0.051 | 31.1 | 4.5 | 6.737 | 43.262 |
| Vietnam | 95656000 | 2540273 | 39037 | 0.015 | 2.1 | 9.3 | 8.576 | 41.423 |

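These data suggest that populations with higher COVID-19 mortality rates tend to have higher obesity rates (>20%): Mexico, Egypt, Russia, and Brazil all combine a fatality ratio of 0.023 or more with an obesity rate above 20%. The pattern is not uniform. The United States has a 37.3% obesity rate but a 0.012 COVID-19 fatality ratio, while China and Indonesia have fatality ratios of 0.030 or more with obesity rates below 7%.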
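The fatality ratios and the skew reasoning in this section can be checked directly from the table. The following is a minimal sketch in base R that hard-codes the 15 table rows (cases, deaths, and obesity only); the variable names `top15`, `skew_direction`, and `group_means` are ours and are not part of the original analysis script. It recomputes the deaths-to-cases ratio, compares the mean with the median as a rough indicator of skew direction, and averages the ratio for countries above and at or below a 20% obesity rate.

```r
# Values copied from the aggregate table above (15 most populous countries).
top15 <- data.frame(
  country = c("China", "India", "United States of America", "Indonesia",
              "Pakistan", "Brazil", "Nigeria", "Bangladesh", "Russia",
              "Mexico", "Japan", "Ethiopia", "Philippines", "Egypt", "Vietnam"),
  cases = c(125667, 42692943, 77918466, 4807778, 1488958, 27552267, 254016,
            1914356, 14102736, 5292706, 3975513, 467575, 3639942, 457081,
            2540273),
  deaths = c(4857, 509358, 922470, 145176, 29828, 639151, 3141, 28838,
             334093, 312819, 20516, 7426, 55094, 23409, 39037),
  obesity = c(6.6, 3.8, 37.3, 6.9, 7.8, 22.3, 7.8, 3.4, 25.7, 28.4, 4.4,
              3.6, 6.0, 31.1, 2.1)
)

# Fatality ratio as defined in the table header: deaths divided by cases.
top15$fatality_ratio <- top15$deaths / top15$cases

# A mean above the median suggests a longer right tail; this is a rule of
# thumb, not a formal skewness test.
ratio_mean <- mean(top15$fatality_ratio)
ratio_median <- median(top15$fatality_ratio)
skew_direction <- if (ratio_mean > ratio_median) "right" else "left"

# Average fatality ratio for obesity above 20% (TRUE) vs. at or below (FALSE).
top15$high_obesity <- top15$obesity > 20
group_means <- tapply(top15$fatality_ratio, top15$high_obesity, mean)

print(round(top15$fatality_ratio, 3))
print(skew_direction)
print(group_means)
```

With only 15 countries, the group comparison is descriptive and should not be read as evidence of a causal effect.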

Chart 1 code

#Libraries used
library("dplyr")
library("ggplot2")

#Import the combined global food (in kilograms) and global COVID-19 data
global_food_data <- read.csv("../data/global_food_and_covid.csv")

#Keep the most recent update for each country, then split countries at the
#median fatality ratio (geom_density() needs a discrete variable for fill)
food_supply_quantity_kg_data <- global_food_data %>%
  group_by(Country_Region) %>%
  filter(Last_Update == max(Last_Update, na.rm = TRUE)) %>%
  ungroup() %>%
  filter(!is.na(Sugar...Sweeteners), !is.na(country_fatality_ratio)) %>%
  mutate(fatality_group = ifelse(
    country_fatality_ratio > median(country_fatality_ratio),
    "Above median", "At or below median"))

#Density plot of the sugar and sweetener share of the food supply,
#filled by COVID-19 fatality group
ggplot(food_supply_quantity_kg_data,
       aes(x = Sugar...Sweeteners, fill = fatality_group)) +
  geom_density(alpha = 0.5)

Chart Purpose

This density map explores the relationship between global sugar/sweetener consumption and global COVID-19 deaths. The chart visualizes how the percentage of deaths varies with the percentage of sugar consumption.

Observation

The map shows a relationship between the percentage of sugar consumption and the percentage of deaths: higher sweetener consumption is associated with more COVID-19-related deaths.

Insight

With this observation, we can see that diets with a higher rate of sugar consumption may affect a nation’s or person’s susceptibility to COVID-19 exposure and death.

Chart 2 code

#load the packages
library(ggplot2)
library(dplyr)
library("plotly")

#cases and deaths by country (scatter plot)
food_and_covid <- read.csv("../data/global_food_and_covid.csv")
covid_specific <- food_and_covid %>%
  select(Country_Region, country_deaths, country_cases, country_fatality_ratio)

#one marker per country, sized by deaths; hovering shows the country name
chart2 <- plot_ly(
  data = covid_specific,
  x = ~country_cases,
  y = ~country_deaths,
  size = ~country_deaths,
  text = ~Country_Region,
  hoverinfo = "text",
  type = "scatter",
  mode = "markers",
  showlegend = FALSE
) %>% layout(title = "COVID-19 Mortality Ratio",
             xaxis = list(range = c(log10(1000), log10(125000000)), title = "Cases", type = "log"),
             yaxis = list(title = "Deaths", type = "log"))

chart2

Chart Purpose

Because nutrition may affect COVID-19 outcomes through its effect on immunity, this early-stage chart plots the number of COVID-19 cases and deaths for each country, and the ratio between them. These figures can then be compared with each country’s food nutrient ratios to analyze the impact of nutrition on COVID-19.

Observation and Insight

According to the scatter plot, almost all countries have a COVID-19 mortality rate below 10%. For some countries with extremely high mortality, such as Belgium (58% death ratio), the food dataset shows that the proportion of meat and fruit is far lower than in countries with low mortality. One possible explanation is that insufficient intake of protein and vitamins leads to decreased immunity.
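The 10% threshold mentioned above can be applied programmatically instead of being read off the plot. The sketch below defines a hypothetical helper, `flag_high_fatality()`, which is our own illustration and not part of the original analysis. It assumes a data frame with `cases` and `deaths` columns, and it is demonstrated on five rows copied from the summary table earlier in this report, not on the full data set.

```r
# Hypothetical helper (not in the original script): return the rows of `data`
# whose deaths-to-cases ratio exceeds `threshold`.
flag_high_fatality <- function(data, threshold = 0.10) {
  data$fatality_ratio <- data$deaths / data$cases
  # which() drops rows where the ratio is NA or NaN (e.g. zero cases).
  data[which(data$fatality_ratio > threshold), , drop = FALSE]
}

# Five rows copied from the summary table, for illustration only.
sample_countries <- data.frame(
  country = c("China", "United States of America", "Mexico", "Japan", "Egypt"),
  cases = c(125667, 77918466, 5292706, 3975513, 457081),
  deaths = c(4857, 922470, 312819, 20516, 23409)
)

# Default threshold of 10%.
print(flag_high_fatality(sample_countries))

# A lower threshold of 5% for comparison.
print(flag_high_fatality(sample_countries, threshold = 0.05))
```

Applied to the full `covid_specific` data frame, the same idea would need the column names `country_cases` and `country_deaths` instead.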

Chart 3 code

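library("dplyr")
library("ggplot2")

food_global_data <- read.csv("../data/Food_Supply_Quantity_kg_Data.csv")

#Keep complete rows and split countries at the median death percentage,
#because a histogram fill must be a discrete variable
obesity_deaths <- food_global_data %>%
  select(Obesity, Deaths) %>%
  filter(!is.na(Obesity), !is.na(Deaths)) %>%
  mutate(death_group = ifelse(Deaths > median(Deaths),
                              "Above median", "At or below median"))

ggplot(obesity_deaths, aes(x = Obesity, fill = death_group)) +
  geom_histogram(bins = 30)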

Chart Purpose

This histogram explores the relationship between national obesity percentages and the percentage of deaths from COVID-19. It visualizes how the death percentage varies with the obesity percentage.

Observation

In the histogram, there seems to be a distinct relationship between the obesity data and COVID-19-related deaths. Higher obesity rates appear to be associated with higher COVID-19 death percentages.

Insight

From these observations, we can see that higher obesity rates within a region or country may increase how susceptible an individual is to COVID-19 contraction and death.
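A histogram shows the shape of the data, but a correlation coefficient gives a single number for the strength of a linear relationship. As a minimal sketch, the code below runs a Pearson correlation test on the obesity rates and fatality ratios of the 15 countries in the summary table earlier in this report. This is an assumption-laden stand-in: it uses the deaths-to-cases ratio, not the `Deaths` column plotted in Chart 3, and 15 countries are too few to support strong conclusions.

```r
# Obesity (%) and fatality ratio for the 15 countries in the summary table,
# in table order (China, India, United States of America, ..., Vietnam).
obesity <- c(6.6, 3.8, 37.3, 6.9, 7.8, 22.3, 7.8, 3.4, 25.7, 28.4, 4.4,
             3.6, 6.0, 31.1, 2.1)
fatality_ratio <- c(0.039, 0.012, 0.012, 0.030, 0.020, 0.023, 0.012, 0.015,
                    0.024, 0.059, 0.005, 0.016, 0.015, 0.051, 0.015)

# Pearson correlation with a significance test (base R, stats package).
result <- cor.test(obesity, fatality_ratio, method = "pearson")

r <- unname(result$estimate)  # correlation coefficient
p <- result$p.value           # p-value for the null hypothesis r = 0

print(r)
print(p)
```

A correlation of this kind describes association only. It does not account for confounders such as age structure, testing rates, or healthcare capacity.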